Distilling Monolingual Models from Large Multilingual Transformers
Authors
Abstract
Although language modeling has been trending steadily upwards, models available for low-resourced languages are limited to large multilingual models such as mBERT and XLM-RoBERTa, which come with significant deployment overheads vis-à-vis their model size, inference speed, etc. We attempt to tackle this problem by proposing a novel methodology that applies knowledge distillation techniques to filter language-specific information from a large multilingual model into small, fast monolingual models that can often outperform the teacher model. We demonstrate the viability of this methodology on two downstream tasks, each in six languages. We further dive into possible modifications of the basic setup by exploring ideas to tune the final vocabulary of the distilled models. Lastly, we perform a detailed ablation study to understand the different components better and to find out what works best for the two under-resourced languages, Swahili and Slovene.
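For readers unfamiliar with the teacher-student setup the abstract refers to, the core objective in knowledge distillation is usually a soft-target loss in the style of Hinton et al. (2015). The sketch below is a minimal, illustrative PyTorch version, not the authors' implementation; the function name, temperature value, and toy dimensions are assumptions for demonstration only.

```python
# Illustrative sketch of a standard soft-target distillation loss
# (Hinton et al., 2015). NOT the paper's code; names and
# hyperparameters are assumptions for demonstration only.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between temperature-softened teacher and student
    output distributions, scaled by T^2 to keep gradient magnitudes
    comparable across temperatures."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * (t * t)

# Toy usage: a batch of 4 examples over a 10-token output space.
teacher_out = torch.randn(4, 10)  # in the paper's setting, from a large multilingual teacher
student_out = torch.randn(4, 10)  # from the small monolingual student
print(distillation_loss(student_out, teacher_out))
```

In practice, this soft-target term is typically mixed with the standard hard-label cross-entropy on the student's training data; the mixing weight is a tunable hyperparameter.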
Similar Papers
Multilingual vs. Monolingual User Models for Personalized Multilingual Information Retrieval
This paper demonstrates that a user of multilingual search has different interests depending on the language used, and that the user model should reflect this. To demonstrate this phenomenon, the paper proposes and evaluates a set of result re-ranking algorithms based on various user model representations.
Multilingual Versus Monolingual WSD
Although it is generally agreed that Word Sense Disambiguation (WSD) is an application-dependent task, the great majority of efforts have aimed at developing WSD systems without considering their application. We argue that this strategy is not appropriate, since some aspects, such as the sense repository and the disambiguation process itself, vary according to the application. Taking...
Multilingual Aspects of Monolingual Corpora
If someone were to collect opinions among computational linguists on what the most important trend in linguistics in the last decade has been, it is highly probable that the majority would answer that it was the massive use of large natural language corpora in many linguistic fields. The concept of collecting large amounts of written or spoken natural language data has become extremely important...
Multilingual Approach to e-Learning from a Monolingual Perspective
This paper describes the efforts undertaken in an international research project LT4eL from the perspective of one of the participating languages, Czech. The project aims at exploiting language technologies for adding new functionalities to an open source Learning Management System ILIAS. The new functionalities are based both on existing and newly developed tools for all languages involved. Th...
Distilling Intractable Generative Models
A generative model’s partition function is typically expressed as an intractable multi-dimensional integral, whose approximation presents a challenge to numerical and Monte Carlo integration. In this work, we propose a new estimation method for intractable partition functions, based on distilling an intractable generative model into a tractable approximation thereof, and using the latter for pr...
Journal
Journal title: Electronics
Year: 2023
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics12041022